Abstract

We propose and study a row-and-column affine measurement scheme for low-rank
matrix recovery. Each measurement is a linear combination of elements in one row
or one column of a matrix X. This setting arises naturally in applications from
different domains. However, current algorithms developed for standard matrix
recovery problems do not perform well in our case, hence the need for developing
new algorithms and theory for our problem. We propose a simple algorithm for the
problem based on Singular Value Decomposition (SVD) and least-squares (LS), which
we term SVLS. We prove that (a simplified version of) our algorithm can recover X
exactly with the minimum possible number of measurements in the noiseless case.
In the general noisy case, we prove performance guarantees on the reconstruction
accuracy under the Frobenius norm. In simulations, our row-and-column design and
SVLS algorithm show improved speed, and comparable and in some cases better
accuracy compared to standard measurement designs and algorithms. Our theoretical
and experimental results suggest that the proposed row-and-column affine
measurement scheme, together with our recovery algorithm, may provide a powerful
framework for affine matrix reconstruction.

Keywords: low-rank matrix recovery, row and column measurements, matrix completion,
singular value decomposition

1 Introduction 

In the low-rank affine matrix recovery problem, an unknown matrix
$X \in \mathbb{R}^{n_1 \times n_2}$ with $rank(X) = r$ is measured indirectly via an
affine transformation $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^d$,
possibly with additive (typically Gaussian) noise $z \in \mathbb{R}^d$. Our goal is to
recover X from the vector of noisy measurements $b = \mathcal{A}(X) + z$. The problem
has found numerous applications throughout science and engineering, in fields such as
collaborative filtering [19], face recognition [1], quantum state tomography [14] and
computational biology [9]. The problem has been studied mathematically quite
extensively in the last few years. Most attention thus far has been given to two
particular ensembles of random transformations $\mathcal{A}$: (i) the Matrix
Completion (MC) setting, in which each element of $\mathcal{A}(X)$ is a single entry
of the matrix, where the subset of the observed measurements is sampled uniformly at
random [17, 18, 21]; (ii) Gaussian-Ensemble (GE) affine-matrix-recovery, in which each
element of $\mathcal{A}(X)$ is a weighted sum of all elements of X with i.i.d. Gaussian
weights [3, 13]. Remarkably, although the recovery problem is in general NP-hard, when
$r \ll \min(n_1, n_2)$ and under certain conditions on the matrix X or the measurement
operator $\mathcal{A}$, one can recover X from $d \ll n_1 n_2$ measurements with high
probability and using efficient algorithms. However, it is desirable to study the
problem with other affine transformations $\mathcal{A}$ beyond the two ensembles
mentioned above, for the following reasons: (i) In some applications we cannot control
the measurement operator $\mathcal{A}$, and different models for the measurements may
be needed to allow a realistic analysis of the problem. (ii) When we can control and
design the measurement operator $\mathcal{A}$, other measurement operators may
outperform the two ensembles mentioned above with respect to optimizing different
resources such as the number of measurements required, computation time and storage.
The main goal of this paper is to present and study a different set of affine
transformations, which we term row-and-column affine measurements. This setting may
arise naturally in many applications, since it is often natural and possibly cheap to
measure a single row or column of a matrix, or a linear combination of a few such rows
and columns. For example, (i) in collaborative filtering, we may wish to recover a
users-items preference matrix and have access to only a subset of the users, but can
observe their preference scores for all items; (ii) when recovering a protein-RNA
interactions matrix in molecular biology, a single experiment may simultaneously
measure the interactions of a specific protein with all RNA molecules.


In general, we can represent any affine transformation $\mathcal{A}$ in matrix
representation $\mathcal{A}(X) = A\,vec(X)$, where $vec(X)$ is a column vector obtained
by stacking all columns of X on top of each other. In our row and column framework the
measurement operator $\mathcal{A}$ is represented differently, using two matrices
$A^{(R)}, A^{(C)}$ which multiply X as a matrix (rather than multiplying the vector
$vec(X)$) from left and right, respectively. We focus on two ensembles of
$A^{(R)}, A^{(C)}$: (i) Matrix Completion from single Columns and Rows (RCMC). Here we
observe single matrix entries, similarly to the standard matrix completion case.
However, the measured entries are not scattered randomly along the matrix; instead we
sample a few rows and a few columns, and measure all entries in these rows and columns.
This ensemble is implemented by setting the rows (columns) of $A^{(R)}$ ($A^{(C)}$) as
random vectors from the standard basis of $\mathbb{R}^{n_1}$ ($\mathbb{R}^{n_2}$).
(ii) Gaussian Row-and-Column (GRC) measurements. Here each set of measurements is a
weighted linear combination of the matrix's rows (or columns), with the weights taken
as i.i.d. Gaussians. This ensemble is implemented by setting the entries of
$A^{(R)}, A^{(C)}$ as i.i.d. Gaussian random variables.

The measurement operators $\mathcal{A}$ in our RCMC and GRC models do not satisfy
standard requirements which hold for GE and MC. It is thus not surprising that
algorithms such as nuclear norm minimization, which succeed for the GE and MC models,
fail in our case, and different algorithms and theory are required. However, the
specific algebraic structure provided by the row-and-column measurements allows us to
derive efficient and simple algorithms, and to analyze their performance. In addition,
we provide extensive simulation results, which demonstrate the improved accuracy and
speed of our approach over existing measurement designs and algorithms. All of our
algorithms and simulations are implemented in a Matlab software package available at
https://github.com/avishaiwa/SVLS


1.1 Prior Work 


Before giving a detailed derivation and analysis of our design and algorithms, we give
an overview of existing designs and their properties. We concentrate on two properties:
(i) storage required in order to represent the measurement operator, and (ii)
measurement sparsity, defined as the sum over all measurements of the number of matrix
entries participating in each measurement, that is $S(\mathcal{A}) = \|vec(A)\|_0$. The
latter property may be related to measurement costs, as well as to computational time.

In the Gaussian Ensemble model, the entries of the matrix A in the matrix
representation $\mathcal{A}(X) = A\,vec(X)$ are i.i.d. Gaussian random variables,
$A_{ij} \sim N(0,1)$. For this ensemble, one can uniquely recover a low rank matrix X
with $O(r(n_1 + n_2))$ noiseless measurements using nuclear norm minimization or other
methods such as Singular Value Projection (SVP), which is optimal up to constants.
Recovery in this model is robust to noise, with only a small increase in the number of
measurements. The main disadvantage of this model is that the design requires
$O(d n_1 n_2)$ storage space for A, which could be problematic for large matrices.
Another possible disadvantage of this method is that measurements are dense: each
measurement represents a linear combination of all $O(n_1 n_2)$ matrix entries, and the
overall measurement sparsity of $\mathcal{A}(X)$ is also $O(d n_1 n_2)$, which could be
problematic for large $n_1, n_2$.
In the standard matrix completion problem we can recover X with high probability from
single entries chosen uniformly at random using nuclear norm minimization [20, 26] or
using other methods such as SVD and gradient descent. This model has the lowest storage
requirements ($O(d)$) and measurement sparsity ($O(d)$). However, recovery guarantees
for this model are weaker: setting $n = \max(n_1, n_2)$, it is shown that
$\Theta(n r \log(n))$ measurements are required to recover a rank $r$ matrix of size
$n_1 \times n_2$. In addition, unique recovery from this number of measurements
requires additional incoherence conditions on the matrix X, and recovery of matrices
which fail to satisfy such conditions (e.g. matrices with a few spikes) may require a
much larger number of measurements.


Recently a new design of rank one projections was proposed, where each measurement is
of the form $\alpha^T X \beta$, and $\alpha \in \mathbb{R}^{n_1}$,
$\beta \in \mathbb{R}^{n_2}$ have i.i.d. standard Gaussian entries. It was proven that
nuclear norm minimization can recover X with high probability in this design from
$O(n_1 r + n_2 r)$ measurements. This is the first model deviating from MC and GE we
are aware of. This model is different from our row-and-column model, as each
measurement is obtained by multiplying X from both sides, whereas in our model each
measurement is obtained by multiplying X from either left or right. Moreover, in our
model the measurements are not chosen independently from each other but come in groups
of size $n_1$ or $n_2$ (corresponding to rows or columns). An advantage of rank one
projections is that they lead to a significant reduction in the measurement storage
needed for $\mathcal{A}$, with overall $O(d n_1 + d n_2)$ storage space. However, each
measurement is still dense and involves all matrix elements, hence measurement sparsity
is $O(d n_1 n_2)$. In contrast, our GRC model requires only $O(d)$ storage for
$\mathcal{A}$, and every measurement depends only on $O(n)$ elements, leading to a
reduced overall time for all measurements, $O(d n_1 + d n_2)$. For RCMC, only the
indices of the sampled rows and columns need to be stored for $\mathcal{A}$, and
measurement sparsity is $O(d)$.


2 Preliminaries and Notations 

We denote by $\mathbb{R}^{n_1 \times n_2}$ the space of matrices of size
$n_1 \times n_2$, by $O_{n_1 \times n_2}$ the space of matrices of size
$n_1 \times n_2$ with orthonormal columns, and by $M^{(r)}_{n_1 \times n_2}$ the space
of matrices of size $n_1 \times n_2$ and rank $\le r$. We denote $n = \max(n_1, n_2)$.

We denote by $\|\cdot\|_F$ the matrix Frobenius norm, by $\|\cdot\|_*$ the nuclear
norm, and by $\|\cdot\|_2$ the spectral norm. For a vector, $\|\cdot\|$ denotes the
standard $\ell_2$ norm.

For $X \in \mathbb{R}^{n_1 \times n_2}$ we denote by $span(X)$ the subspace of
$\mathbb{R}^{n_1}$ spanned by the columns of X and define $P_X$ to be the orthogonal
projection onto $span(X)$.

For a matrix X we denote by $X_{i\bullet}$ the $i$-th row, by $X_{\bullet j}$ the
$j$-th column and by $X_{ij}$ the $(i,j)$ element. For two sets of indices $I, J$, we
denote by $X_{IJ}$ the sub-matrix obtained by taking the rows with indices in $I$ and
columns with indices in $J$ of X. We denote by $[k]$ the set of indices $1, \ldots, k$.
We denote by $vec(X)$ the (column) vector obtained by stacking all the columns of X on
top of each other.

We use the notation $X \overset{i.i.d.}{\sim} G$ to denote a random matrix X with
i.i.d. entries $X_{ij} \sim G$.

For a rank-$r$ matrix $X \in M^{(r)}_{n_1 \times n_2}$ let $X = U \Sigma V^T$ be the
Singular Value Decomposition (SVD), where $U \in O_{n_1 \times r}$,
$V \in O_{n_2 \times r}$ and $\Sigma = diag(\sigma_1(X), \ldots, \sigma_r(X))$ with
$\sigma_1(X) \ge \sigma_2(X) \ge \ldots \ge \sigma_r(X) > 0$ the (non-zero) singular
values of X (we omit the zero singular values and their corresponding vectors from the
decomposition). For a general matrix $X \in \mathbb{R}^{n_1 \times n_2}$ we denote by
$X^{(r)}$ the top-$r$ singular value decomposition of X,
$X^{(r)} = U_{\bullet [r]} \Sigma_{[r][r]} V^T_{\bullet [r]}$.
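The top-$r$ singular value decomposition used throughout the paper can be computed from a full SVD. The NumPy sketch below is illustrative only: the function names `truncated_svd` and `top_r_approximation` are hypothetical and are not taken from the authors' Matlab package.

```python
import numpy as np

def truncated_svd(X, r):
    """Return the top-r singular triplets (U_r, s_r, Vt_r) of X."""
    # full_matrices=False gives the economy-size SVD, singular values sorted descending
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :r], s[:r], Vt[:r, :]

def top_r_approximation(X, r):
    """Rank-r matrix built from the r largest singular values of X."""
    U, s, Vt = truncated_svd(X, r)
    # (U * s) scales column j of U by s[j], which equals U @ diag(s)
    return (U * s) @ Vt
```

For a matrix whose rank is at most $r$, `top_r_approximation` returns the matrix itself up to floating-point error.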

Our model assumes two affine transformations applied to X, representing rows and
columns, $B^{(C,0)} = X A^{(C)}$ and $B^{(R,0)} = A^{(R)} X$, achieved by
multiplications with two matrices $A^{(C)} \in \mathbb{R}^{n_2 \times k^{(C)}}$ and
$A^{(R)} \in \mathbb{R}^{k^{(R)} \times n_1}$ respectively. We obtain noisy
observations of these transformations, $B^{(R)}, B^{(C)}$, obtained by applying
additive noise:

$A^{(R)} X + Z^{(R)} = B^{(R)}$ ; $X A^{(C)} + Z^{(C)} = B^{(C)}$ (1)

where the total number of measurements is $d = k^{(C)} n_1 + k^{(R)} n_2$, and
$Z^{(R)} \in \mathbb{R}^{k^{(R)} \times n_2}$,
$Z^{(C)} \in \mathbb{R}^{n_1 \times k^{(C)}}$ are two zero-mean noise matrices. Our
goal is to recover X from the observed measurements $B^{(R)}$ and $B^{(C)}$. To achieve
this goal, we define the squared loss function

$\mathcal{F}(X) = \|A^{(R)} X - B^{(R)}\|_F^2 + \|X A^{(C)} - B^{(C)}\|_F^2$ (2)

and solve the least squares problem:

Minimize $\mathcal{F}(X)$ s.t. $X \in M^{(r)}_{n_1 \times n_2}$. (3)

If $Z^{(R)}, Z^{(C)} \overset{i.i.d.}{\sim} N(0, \tau^2)$, minimizing the loss function
in eq. (2) is equivalent to maximizing the log-likelihood of the data, giving a
statistical motivation for the above score. Problem (3) is non-convex due to the
non-convex rank constraint $rank(X) \le r$.
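As a concrete reading of the squared loss in eq. (2), the following NumPy sketch evaluates it for a candidate matrix. The function name `squared_loss` and the argument names are hypothetical choices made here for illustration.

```python
import numpy as np

def squared_loss(X, A_R, A_C, B_R, B_C):
    """Sum of squared Frobenius residuals of the row and column measurements.

    A_R has shape (k_R, n1) and multiplies X from the left;
    A_C has shape (n2, k_C) and multiplies X from the right.
    """
    row_residual = A_R @ X - B_R   # shape (k_R, n2)
    col_residual = X @ A_C - B_C   # shape (n1, k_C)
    return (np.linalg.norm(row_residual, "fro") ** 2
            + np.linalg.norm(col_residual, "fro") ** 2)
```

With noiseless measurements the loss is zero at the true matrix, and any candidate that disagrees with the measurements has strictly positive loss.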

Our problem is a specialization of the general affine matrix recovery problem, in
which a matrix is measured using a general affine transformation $\mathcal{A}$ with
$b = \mathcal{A}(X) + z$. We consider next and throughout the paper two specific random
ensembles of measurement matrices:

1. Row and Column Matrix Completion (RCMC): In this ensemble each row of $A^{(R)}$
and each column of $A^{(C)}$ is a vector of the standard basis $e_j$ for some $j$,
thus each measurement $B^{(R)}_{ij}$ or $B^{(C)}_{ij}$ is obtained from a single
entry of X. We define a row-inclusion probability $p^{(R)}$ and a column-inclusion
probability $p^{(C)}$ such that each row (column) of X will be measured with
probability $p^{(R)}$ ($p^{(C)}$). More precisely, we define $r_1, \ldots, r_{n_1}$
i.i.d. Bernoulli variables, $P(r_i = 1) = p^{(R)}$, and include $e_i$ as a row in
$A^{(R)}$ if and only if $r_i = 1$. Similarly, we define $c_1, \ldots, c_{n_2}$
i.i.d. Bernoulli variables, $P(c_i = 1) = p^{(C)}$, and include $e_i$ as a column in
$A^{(C)}$ if and only if $c_i = 1$. The expected number of observed rows (columns) is
$k^{(R)} = n_1 p^{(R)}$ ($k^{(C)} = n_2 p^{(C)}$). The model is very similar to the
possibly more natural model of picking $k^{(R)}$ distinct rows and $k^{(C)}$ distinct
columns at random for fixed $k^{(R)}, k^{(C)}$, but allows for easier analysis.

2. Gaussian Rows and Columns (GRC): In this ensemble
$A^{(R)}, A^{(C)} \overset{i.i.d.}{\sim} N(0,1)$. Each observation $B^{(R)}_{ij}$ or
$B^{(C)}_{ij}$ is obtained by a weighted sum of a single row or column of X, with
i.i.d. Gaussian weights.
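The two ensembles can be simulated in a few lines. The sketch below is a minimal NumPy illustration under the Bernoulli inclusion model for RCMC and standard Gaussian entries for GRC; the function names `rcmc_design`, `grc_design` and `measure` are hypothetical and do not refer to the authors' Matlab code.

```python
import numpy as np

def rcmc_design(n1, n2, p_R, p_C, rng):
    """Bernoulli row/column sampling: each row (column) is kept independently."""
    rows = np.flatnonzero(rng.random(n1) < p_R)   # indices of sampled rows
    cols = np.flatnonzero(rng.random(n2) < p_C)   # indices of sampled columns
    A_R = np.eye(n1)[rows, :]    # each row is a standard basis vector, shape (k_R, n1)
    A_C = np.eye(n2)[:, cols]    # each column is a standard basis vector, shape (n2, k_C)
    return A_R, A_C

def grc_design(n1, n2, k_R, k_C, rng):
    """Design matrices with i.i.d. N(0, 1) entries."""
    return rng.standard_normal((k_R, n1)), rng.standard_normal((n2, k_C))

def measure(X, A_R, A_C, tau=0.0, rng=None):
    """Row and column measurements with optional i.i.d. N(0, tau^2) noise."""
    B_R, B_C = A_R @ X, X @ A_C
    if tau > 0:
        rng = np.random.default_rng() if rng is None else rng
        B_R = B_R + tau * rng.standard_normal(B_R.shape)
        B_C = B_C + tau * rng.standard_normal(B_C.shape)
    return B_R, B_C
```

In the RCMC case `A_R @ X` simply copies the sampled rows of `X`, so in practice one would index `X` directly rather than form the selection matrices; they are built explicitly here only to mirror the notation.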

2.1 Comparison to Standard Designs 

Our proposed rows-and-columns design differs from standard designs appearing in the
literature. It is instructive to compare our GRC ensemble to the Gaussian Ensemble
(GE), with the matrix representation $\mathcal{A}(X) = A\,vec(X)$ where
$A \in \mathbb{R}^{d \times n_1 n_2}$ and $A \overset{i.i.d.}{\sim} N(0, 1)$. For the
latter, the following r-Restricted Isometry Property (RIP) can be used:

Definition 1. (r-RIP) Let $\mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^d$
be a linear map. For every integer $r$ with $1 \le r \le \min(n_1, n_2)$, define the
r-Restricted Isometry Constant to be the smallest number $\epsilon_r$ such that

$(1 - \epsilon_r) \|X\|_F \le \|\mathcal{A}(X)\| \le (1 + \epsilon_r) \|X\|_F$ (4)

holds for all matrices X of rank at most $r$.


The GE model satisfies the r-RIP condition for $d = O(rn)$ with high probability [22].
Based on this property it is known that nuclear norm minimization and other algorithms
such as SVP can recover X with high probability. Unlike GE, in our GRC model
$\mathcal{A}(X)$ doesn't satisfy the r-RIP, and nuclear norm minimization fails.
Instead, $A^{(R)}, A^{(C)}$ preserve the matrix Frobenius norm with high probability, a
weaker property which holds for any low-rank matrix (see Lemma 7 in the Appendix).

We next compare RCMC to the standard Matrix Completion model, in which single entries
are chosen at random to be observed. Unlike GE, for MC incoherence conditions on X are
required in order to guarantee unique recovery of X:


Definition 2. (Incoherence). Let $U$ be a subspace of $\mathbb{R}^n$ of dimension $r$,
and $P_U$ be the orthogonal projection on $U$. Then the coherence of $U$ (with respect
to the standard basis $\{e_i\}$) is defined as

$\mu(U) = \frac{n}{r} \max_i \|P_U(e_i)\|^2$. (5)

We say that a matrix $X \in \mathbb{R}^{n_1 \times n_2}$ is $\mu$-incoherent if for the
SVD $X = U \Sigma V^T$ we have $\max(\mu(U), \mu(V)) \le \mu$.
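The coherence of a subspace is straightforward to evaluate numerically from any matrix whose columns span it. The following NumPy sketch uses the standard form $\mu(U) = \frac{n}{r}\max_i \|P_U e_i\|^2$ and assumes the input has full column rank; the function name `coherence` is hypothetical.

```python
import numpy as np

def coherence(U):
    """Coherence of the column space of U (assumed to have full column rank)."""
    n, r = U.shape
    # Orthonormalize so that the projection is P = Q @ Q.T
    Q, _ = np.linalg.qr(U)
    # ||P e_i||^2 = ||Q^T e_i||^2 = squared norm of the i-th row of Q
    row_norms_sq = np.sum(Q ** 2, axis=1)
    return (n / r) * row_norms_sq.max()
```

The value ranges from 1 (for example the span of the all-ones vector) up to $n/r$ (a subspace containing a standard basis vector), which is why spiky matrices are the hard case for entry-wise sampling.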

When X is $\mu$-incoherent, and when known entries are sampled uniformly at random from
X, several algorithms succeed in recovering X with high probability. In particular,
nuclear norm minimization has gained popularity as a solver for the standard MC problem
because it provides recovery guarantees and a convenient representation as a convex
optimization problem, with many iterative solvers available for the problem. However,
nuclear norm minimization fails for the RCMC design, even when the matrix X is
incoherent, as shown by the next example:


Example 1. Take $X \in \mathbb{R}^{n \times n}$ for $n \in \mathbb{N}$ with $X_{ij} = 1$ for all $(i,j) \in [n] \times [n]$. Thus

||X||* = n. Take Set all unknown entries to O.ii, giving a matrix Xq 

of rank 2 with cti(Xo) = ct 2 {Xo) = Therefore ||^o||* = < 

$\|X\|_*$ and nuclear norm minimization fails to recover the correct X.
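The failure of nuclear norm minimization on the all-ones matrix can be checked numerically. The sketch below is an illustration under assumptions made here, not a reproduction of the example's exact construction: it observes the first $k$ rows and columns and fills the unobserved block with zeros, which may differ from the fill value used in Example 1. The function name `zero_fill_example` is hypothetical.

```python
import numpy as np

def nuclear_norm(M):
    """Sum of singular values."""
    return np.linalg.svd(M, compute_uv=False).sum()

def zero_fill_example(n, k):
    """Compare the all-ones matrix with a completion that zeroes unobserved entries.

    The first k rows and first k columns are treated as observed (an assumption
    made for this illustration); both matrices agree on every observed entry.
    """
    X = np.ones((n, n))
    X0 = np.zeros((n, n))
    X0[:k, :] = 1.0   # observed rows
    X0[:, :k] = 1.0   # observed columns
    return nuclear_norm(X), nuclear_norm(X0), np.linalg.matrix_rank(X0)

if __name__ == "__main__":
    print(zero_fill_example(100, 10))
```

For this zero-filled completion a short calculation gives nuclear norm $\sqrt{4kn - 3k^2}$, which is below $n$ whenever $k < n/3$. A rank-2 matrix consistent with all observations then has smaller nuclear norm than the true rank-1 matrix, so the true matrix cannot be the minimizer.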

In Section 3 we present our SVLS algorithm, which does not rely on nuclear-norm
minimization. In Section 4 we show that SVLS successfully approximates X for the
GRC ensemble.


3 Algorithms for Recovery of X 

In this section we give an efficient algorithm which we call SVLS (Singular Value Least
Squares). SVLS is very easy to implement. For simplicity, we start with Algorithm 1
for the noiseless case and then present Algorithm 2 (SVLS), which is applicable to the
general (noisy) case.

3.1 Noiseless Case 

In the noiseless case we reduce the optimization problem (3) to solving the system of
linear equations in eq. (1), and provide Algorithm 1, which often leads to a
closed-form estimator. We then give conditions under which, with high probability, the
closed-form solution is unique and is equal to the true matrix X. If
$rank(A^{(R)} U) = r$ one can write the resulting estimator $\hat{X}$ in closed form as
follows:

$\hat{X} = U \hat{Y} = U [U^T A^{(R)T} A^{(R)} U]^{-1} U^T A^{(R)T} B^{(R,0)}$ (6)

Algorithm 1

Input: $A^{(R)}, A^{(C)}, B^{(R,0)}, B^{(C,0)}$ and rank $r$

1. Compute a basis (of size $r$) to the column space of $B^{(C,0)}$ using Gaussian
elimination, represented as the columns of a matrix $U \in \mathbb{R}^{n_1 \times r}$.

2. Solve the linear system $B^{(R,0)}_{\bullet j} = A^{(R)} U Y_{\bullet j}$ for each
$j = 1, \ldots, n_2$ and write the solutions as a matrix
$\hat{Y} = [Y_{\bullet 1} \ldots Y_{\bullet n_2}]$.

3. Output $\hat{X} = U \hat{Y}$


Algorithm 1 does not treat the row and column measurements symmetrically. We can apply
the same algorithm, but changing the role of rows and columns. The resulting closed
form solution is then:

$\hat{X} = B^{(C,0)} A^{(C)T} V [V^T A^{(C)} A^{(C)T} V]^{-1} V^T$ (7)

for an orthogonal matrix V representing a basis for the rows of X. Since the algorithm 
uses Gaussian elimination steps for solving systems of linear equations, it is crucial 
that we have exact noiseless measurements. Next, we modify the algorithm to work 
also for noisy measurements. 
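The noiseless procedure can be sketched in NumPy as follows. This is an illustration, not the authors' implementation: the basis of the column space is obtained here from an SVD rather than by Gaussian elimination (a substitution made for numerical convenience), and the function name `recover_noiseless` is hypothetical.

```python
import numpy as np

def recover_noiseless(A_R, B_R, B_C, r):
    """Recover a rank-r matrix X from noiseless B_R = A_R @ X and B_C = X @ A_C."""
    # Step 1: an orthonormal basis (n1 x r) for the column space of B_C.
    # SVD is used here in place of Gaussian elimination.
    U = np.linalg.svd(B_C, full_matrices=False)[0][:, :r]
    # Step 2: solve B_R = (A_R U) Y for all n2 columns at once.
    Y, *_ = np.linalg.lstsq(A_R @ U, B_R, rcond=None)
    # Step 3: assemble the estimate.
    return U @ Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n1, n2, r, k = 30, 20, 3, 6
    X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
    A_R = rng.standard_normal((k, n1))
    A_C = rng.standard_normal((n2, k))
    X_hat = recover_noiseless(A_R, A_R @ X, X @ A_C, r)
    print(np.linalg.norm(X_hat - X) / np.linalg.norm(X))
```

Note that the column design matrix is not needed as an input in this direction: the column measurements are used only to identify the column space, and the row measurements determine the coefficients.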

3.2 General (Noisy) Case 

In the noisy case we seek a matrix X minimizing the loss $\mathcal{F}$ in eq. (2). The
minimization problem is non-convex and there is no known algorithm with optimality
guarantees. We propose Algorithm 2 (SVLS), which empirically returns a matrix estimator
$\hat{X}$ with a low value of the loss $\mathcal{F}$. In addition, we prove in
Section 4 recovery guarantees on the performance of SVLS.
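Algorithm 2 SVLS

Input: $A^{(R)}, A^{(C)}, B^{(R)}, B^{(C)}$ and rank $r$

1. Compute $\hat{U}$, the $r$ largest left singular vectors of $B^{(C)}$ ($\hat{U}$ is
a basis for the column space of the top-$r$ SVD of $B^{(C)}$).

2. Find the least-squares solution

$\hat{Y} = argmin_Y \|B^{(R)} - A^{(R)} \hat{U} Y\|_F$. (8)

If $rank(A^{(R)} \hat{U}) = r$ we can write $\hat{Y}$ in closed form as before:

$\hat{Y} = [\hat{U}^T A^{(R)T} A^{(R)} \hat{U}]^{-1} \hat{U}^T A^{(R)T} B^{(R)}$ (9)

3. Compute the estimate $\hat{X}^{(R)} = \hat{U} \hat{Y}$.

4. Repeat steps 1-3, replacing the roles of columns and rows to get an estimate
$\hat{X}^{(C)}$.

5. Set $\hat{X} = argmin_{X \in \{\hat{X}^{(R)}, \hat{X}^{(C)}\}} \mathcal{F}(X)$, for
the loss $\mathcal{F}(X)$ given in eq. (2).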
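The five steps of SVLS translate almost line by line into NumPy. The sketch below is a minimal illustration under the conventions that the row design matrix has shape $k^{(R)} \times n_1$ and the column design matrix has shape $n_2 \times k^{(C)}$; the function names are hypothetical and the code is not taken from the authors' Matlab package.

```python
import numpy as np

def squared_loss(X, A_R, A_C, B_R, B_C):
    """Squared Frobenius residual of row and column measurements."""
    return (np.linalg.norm(A_R @ X - B_R, "fro") ** 2
            + np.linalg.norm(X @ A_C - B_C, "fro") ** 2)

def _one_sided_estimate(A, B_solve, B_basis, r):
    """Steps 1-3: subspace from B_basis, coefficients from A and B_solve."""
    # Step 1: r leading left singular vectors of the "basis" measurements
    U = np.linalg.svd(B_basis, full_matrices=False)[0][:, :r]
    # Step 2: least-squares fit of B_solve by (A U) Y
    Y, *_ = np.linalg.lstsq(A @ U, B_solve, rcond=None)
    # Step 3: rank-(at most r) estimate
    return U @ Y

def svls(A_R, A_C, B_R, B_C, r):
    """Return whichever of the two one-sided estimates has the lower loss."""
    X_R = _one_sided_estimate(A_R, B_R, B_C, r)
    # Step 4: swap roles by transposing the model:
    # X^T has column space equal to the row space of X, and A_C^T X^T = B_C^T.
    X_C = _one_sided_estimate(A_C.T, B_C.T, B_R.T, r).T
    # Step 5: keep the candidate with the smaller loss
    candidates = [X_R, X_C]
    losses = [squared_loss(X, A_R, A_C, B_R, B_C) for X in candidates]
    return candidates[int(np.argmin(losses))]
```

Using `lstsq` instead of the explicit normal-equations inverse gives the same solution when the system has full column rank and is better behaved numerically otherwise; this is a design choice of the sketch.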


3.2.1 Gradient Descent 

The estimator $\hat{X}$ returned by SVLS may not minimize the loss function
$\mathcal{F}$ in eq. (2). We therefore perform an additional gradient descent stage
starting from $\hat{X}$ to achieve an estimator with lower loss (while still possibly
only a local minimum, since the problem is non-convex). SVLS can thus be viewed as a
fast method for providing a desirable starting point for local-search algorithms. The
details of the gradient descent are given in the Appendix.

3.3 Estimation of Unknown Rank 

In real life problems, one doesn't know the true rank of a matrix and should estimate
it from data. Our rows-and-columns sampling design is particularly suitable for rank
estimation, since $rank(B^{(R,0)}) = rank(B^{(C,0)}) = rank(X)$ with high probability
when enough rows and columns are sampled. In the noiseless case we can estimate
$rank(X)$ by $\hat{r} = rank(B^{(R,0)})$ or $rank(B^{(C,0)})$.

For the noisy case we estimate $rank(X)$ from $B^{(R)}, B^{(C)}$. We use the popular
elbow method to estimate the rank from $B^{(C)}$ in the following way



( 10 ) 
We compute similarly $\hat{r}^{(R)}$ from $B^{(R)}$ and take the average as our rank
estimator, $\hat{r} = round\left(\frac{\hat{r}^{(R)} + \hat{r}^{(C)}}{2}\right)$. We
demonstrate the performance of our rank estimation using simulations in the Appendix.

Modern methods for rank estimation from singular values can be similarly applied to
$B^{(R)}, B^{(C)}$ and may yield more accurate rank estimates. After we estimate the
rank, we can plug in $\hat{r}$ as the rank parameter in the SVLS algorithm and recover
X.
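A rank estimate of this kind can be sketched as follows. The exact elbow criterion of eq. (10) is not reproduced here; the code assumes one common variant, namely picking the index with the largest ratio between consecutive singular values, and the rounding of ties upward is likewise an assumption. The function names `elbow_rank` and `estimate_rank` are hypothetical.

```python
import numpy as np

def elbow_rank(B):
    """Index (1-based) of the largest gap ratio s_i / s_{i+1} among singular values.

    Assumed variant of the elbow rule; other elbow criteria exist.
    """
    s = np.linalg.svd(B, compute_uv=False)
    # Guard against division by an exactly zero singular value
    ratios = s[:-1] / np.maximum(s[1:], np.finfo(float).tiny)
    return int(np.argmax(ratios)) + 1

def estimate_rank(B_R, B_C):
    """Average the two one-sided estimates and round (halves rounded up here)."""
    avg = (elbow_rank(B_R) + elbow_rank(B_C)) / 2.0
    return int(np.floor(avg + 0.5))
```

The estimate returned by `estimate_rank` can then be passed as the rank parameter to a recovery routine.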


3.4 Low Rank Approximation 


In the low rank matrix approximation problem, the goal is to approximate a (possibly
full rank) matrix X by the closest (in Frobenius norm) rank-$r$ matrix $X^{(r)}$. By
the Eckart-Young Theorem, this problem has a closed-form solution, which is the
truncated SVD of X. SVD is a powerful tool in affine matrix recovery, and different
algorithms such as SVT, OptSpace, SVP and others apply SVD. In [15] the authors try to
find a low rank approximation to X using measurements $X A^{(C)} = B^{(C)}$ and
$A^{(R)} X = B^{(R)}$. For large $n_1, n_2$ they give a single-pass algorithm which
computes $X^{(r)}$ using only $B^{(R)}$ and $B^{(C)}$. We bring their algorithm in the
Appendix, Section 7.6. The main difference between the above formulation and our
problem in eq. (3) is the rank estimation. In [15] it is assumed that
$k^{(R)} = k^{(C)} = k$ and one estimates a rank-$k$ matrix instead of a rank-$r$
matrix, which can lead to poor performance if $r \ll k$. We adjusted the algorithm
presented in [15] to our problem and give a new estimator which is a combination of
SVLS and the method of [15], replacing $\hat{X}^{(R)}$ and $\hat{X}^{(C)}$ in steps 3, 4
of SVLS by:

$\hat{X}_P^{(R)} = \hat{X}^{(R)} \hat{V} \hat{V}^T$ ,
$\hat{X}_P^{(C)} = \hat{U} \hat{U}^T \hat{X}^{(C)}$ (11)

Here $\hat{V}$ is the $r$ largest right singular vectors of $B^{(R)}$ and $\hat{U}$ is
the $r$ largest left singular vectors of $B^{(C)}$. We call this new estimator
$SVLS_P$. Simulations show almost identical and in some cases slightly better
performance of this modified algorithm compared to SVLS (see Appendix). This modified
estimator is however difficult to analyze rigorously, and therefore we present
throughout the paper our results for the SVLS estimator.


4 Performance Guarantees 

In this section we give guarantees on the accuracy of the estimator $\hat{X}$ returned
by SVLS. Our guarantees are probabilistic, with respect to randomizing the design
matrices $A^{(R)}, A^{(C)}$. For the noiseless case we give conditions which are close
to optimal for exact recovery.

4.1 Noiseless Case 

A rank $r$ matrix of size $n_1 \times n_2$ has $r(n_1 + n_2 - r)$ degrees of freedom,
and can therefore not be uniquely recovered by fewer measurements. Setting
$k^{(R)} = k^{(C)} = r$ gives precisely this minimal number of measurements. We next
show that this number suffices, with probability 1, to guarantee accurate recovery of X
in the GRC model. In the RCMC model the number of measurements is increased by a
logarithmic factor in $n$ and we need an additional incoherence assumption on X in
order to guarantee accurate recovery with high probability. We first present two Lemmas
which will be useful. Their proofs are given in the Appendix.
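The degrees-of-freedom count can be checked with simple arithmetic. In the short sketch below, the helper names are hypothetical, and counting distinct observed entries for row-and-column sampling (entries in the intersection of sampled rows and columns are counted once) is one illustrative way to relate the design to the count.

```python
def degrees_of_freedom(n1, n2, r):
    """Number of free parameters of a rank-r matrix of size n1 x n2."""
    return r * (n1 + n2 - r)

def rcmc_distinct_entries(n1, n2, k_R, k_C):
    """Distinct entries seen when k_R full rows and k_C full columns are observed."""
    # k_R rows contribute k_R * n2 entries, k_C columns contribute k_C * n1,
    # and the k_R * k_C entries in their intersection are counted twice.
    return k_R * n2 + k_C * n1 - k_R * k_C

if __name__ == "__main__":
    print(degrees_of_freedom(150, 150, 3), rcmc_distinct_entries(150, 150, 3, 3))
```

With $r$ sampled rows and $r$ sampled columns the number of distinct entries equals $r(n_1 + n_2 - r)$ exactly.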

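Lemma 1. Let $X_1, X_2 \in M^{(r)}_{n_1 \times n_2}$ and
$A^{(R)} \in \mathbb{R}^{k^{(R)} \times n_1}$,
$A^{(C)} \in \mathbb{R}^{n_2 \times k^{(C)}}$ such that
$rank(A^{(R)} X_1) = rank(X_1 A^{(C)}) = r$. If $A^{(R)} X_1 = A^{(R)} X_2$ and
$X_1 A^{(C)} = X_2 A^{(C)}$ then $X_1 = X_2$.

Lemma 2. Let $X \in M^{(r)}_{n_1 \times n_2}$ and
$A^{(R)} \in \mathbb{R}^{k^{(R)} \times n_1}$,
$A^{(C)} \in \mathbb{R}^{n_2 \times k^{(C)}}$ such that
$rank(A^{(R)} X) = rank(X A^{(C)}) = r$. For Algorithm 1 with inputs $A^{(R)}$,
$A^{(C)}$, $B^{(R,0)}$, $B^{(C,0)}$ and $r$, the output $\hat{X}$ satisfies

$A^{(R)} \hat{X} = A^{(R)} X$, $\hat{X} A^{(C)} = X A^{(C)}$ (12)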

4.1.1 Exact Recovery for GRC 

For the noiseless case, we can recover X with the minimal number of measurements,
as shown in Theorem 1 (proof given in the Appendix):

Theorem 1. Let $\hat{X}$ be the output of Algorithm 1 in the GRC model with
$Z^{(R)}, Z^{(C)} = 0$ and $k^{(R)}, k^{(C)} \ge r$. Then $P(\hat{X} = X) = 1$.

4.1.2 Exact Recovery for RCMC 

In the RCMC model, rows and columns of X are sampled with replacement. Since the 
same row can be sampled over and over, we cannot guarantee uniqueness of solution, 
as was the case for the GRC model, but rather wish to prove that exact recovery of X 
is possible with high probability. We assume the Bernoulli rows and columns model as
described in Section 2 and assume for simplicity that $k^{(R)} = k^{(C)} = k$.

Theorem 2. Let X = U'EV'^ be the SVDofX G Knixn 2 > and uiax{p{U), p.{V)) < 
fi. Take A'^^^ and A^’^^ as in the RCMC model without noise and probabilities p^^'l = 
and Let (3 > 1 such that < 1 where Cr is uni¬ 
form constant and let X be the output of Algorithm\I\ Then P > 1 — 

6min(ni, 712 )“^. 

The proof of Theorem 2 is in the Appendix.

Remark 1. Both row and column measurements are needed in order to guarantee unique
recovery. If for example, we observe only rows then even with n — 1 observed rows 
and rank r = 1 we can only determine the unobserved row up to a constant, and thus 
cannot recover X uniquely. 


4.2 General (Noisy) Case 

In the noisy case we cannot guarantee exact recovery of X, and our goal is to minimize
the error $\|X - \hat{X}\|_F$ for $\hat{X}$ the output of SVLS. Here, we give bounds on
the error for the GRC model. For simplicity, we show the result for
$k^{(R)} = k^{(C)} = k$.

We focus on the high dimensional case $k \le n$, where the number of measurements
is low. In this case our bound is similar to the bound of the Gaussian Ensemble (GE).
In it is shown for GE that | |X — X| holds with high probability for 

some constant $C_0$. We next give an analogous result for our GRC model (proof in the
Appendix).

Theorem 3. Let and with k > max{Ar, 40) be as in the GRC model with 
noise matrices Let X be the output of SVLS. Then there exist constants 

c, c^^\ such that with probability > 1 — 





(^)|| Z (^)|| 2 + C («)||^(«)||2 


(13) 


Theorem[3applies for any wd Z^^\ If fc < n?indZ^^\Z^^'> 
then from eq. we get max(||Zl^l|| 2 , || 2 ' 1 ‘" 1 || 2 ) < dr^/n with probability 1 — 
g-2n therefore get the next Corollary for i.i.d. additive Gaussian noise: 


Corollary 1. Let A^^\ as in the GRC with n > k > max(4r,40), model and 
Z^^\ * ■ W(0, T^). Then there exist constants c, Cgrc such that: 


p(\\x 


— < Cgrc 



> 1 - - e-2". 


(14) 
5 Simulation Results


We studied the performance of our algorithm using simulations. We measured the
reconstruction accuracy using the Relative Root-Mean-Squared-Error (RRMSE), defined as

$RRMSE = RRMSE(X, \hat{X}) = \|X - \hat{X}\|_F / \|X\|_F$. (15)
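For reference, the relative error used in the simulations is the Frobenius norm of the reconstruction error divided by the Frobenius norm of the true matrix. A one-function NumPy sketch (the name `rrmse` is a hypothetical choice) is:

```python
import numpy as np

def rrmse(X, X_hat):
    """Relative error ||X - X_hat||_F / ||X||_F."""
    return np.linalg.norm(X - X_hat, "fro") / np.linalg.norm(X, "fro")
```

The value is 0 for a perfect reconstruction and 1 when the estimate is the zero matrix, which gives a convenient scale for judging the error values reported in the simulations.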

For simplicity, we concentrated on square matrices with $n_1 = n_2 = n$ and used
an equal number of row and column measurements, $k^{(R)} = k^{(C)} = k$. In all
simulations we sampled a random rank-$r$ matrix $X = U V^T$ with
$U, V \in \mathbb{R}^{n \times r}$, $U, V \overset{i.i.d.}{\sim} N(0, \sigma^2)$.

In all simulations we assumed that $rank(X)$ is unknown and estimated it using the
elbow method in eq. (10).


5.1 Row-Column Matrix Completion (RCMC) 


In the noiseless case we compared our design to standard MC. We compared the
reconstruction rate (probability of exact recovery of X as a function of the number of
measurements $d$) for the RCMC design with SVLS to the reconstruction rate for the
standard MC design with the OptSpace [18] and SVT [2] algorithms. To allow for
numerical errors, for each simulation yielding X and $\hat{X}$ we defined recovery as successful
if their RRMSE was lower than 10“^, and for each value of d recorded the percent¬ 
age of simulations for which recovery was successful. In Figure [T| we show results for 
$n = 150$, $r = 3$ and $\sigma = 1$. SVLS recovers X with probability 1 with the optimal
number of measurements $d = r(2n - r) = 891$, yielding $d/n^2 \approx 0.04$, while MC with
OptSpace and SVT need roughly 3-fold and 8-fold more measurements, respectively,
to guarantee exact recovery.
Figure 1: Reconstruction rates for matrices with dimension $n = 150$ and $r = 3$, where
$d$ is the number of known entries, varied between 0 and 8000. SVT and OptSpace are
applied to the standard MC design and Algorithm 1 to RCMC. For each $d$ we sampled
50 matrices and calculated the reconstruction rate as described in the main text.


The improvement in accuracy is not due to our design or our algorithm alone, but
due to their combination. We compared our method to OptSpace and SVT for RCMC.
We sampled a matrix X with $n = 100$, $r = 3$, $\sigma = 1$ and noise level
$\tau^2 = 0.25^2$, and varied the number of row and column measurements $k$. Figure 2
shows that while the performance of SVLS is very stable even for small $k$, the
performance of OptSpace varies, with multiple instances achieving poor accuracy, and
SVT, which minimizes the nuclear norm, achieves poor accuracy for all problem instances.

Remark 2. The OptSpace algorithm has a trimming step which deletes dense columns.
We omitted this step in the RCMC model, since it would delete all the known columns
and rows and is not stable for this type of measurements. Even without trimming,
OptSpace still gets better results than SVT.
Figure 2: Box-plots represent the distribution of RRMSE as a function of the number
of column and row measurements $k$ over 50 different sampled matrices $X = U V^T$
with $U, V \overset{i.i.d.}{\sim} N(0, 1)$ and
$Z^{(R)}, Z^{(C)} \overset{i.i.d.}{\sim} N(0, 0.25^2)$. OptSpace (red) fails to
recover X on many instances while SVLS (blue) performs very well on all of them.
SVT (black) fails to recover X for all instances. The trimming of dense rows and
columns in OptSpace was skipped, since such trimming in our settings may delete all
measurement information for low $k$.

Next, we compared our RCMC to standard MC. We sampled X as before with
$U, V \in \mathbb{R}^{1000 \times r}$ with standard Gaussian distribution, different
ranks and different noise ratios. The observations were corrupted by additive Gaussian
noise Z with relative noise level $NR = \|Z\|_F / \|X\|_F$.

Results, displayed in Table 1, show that SVLS is significantly faster than the other
two algorithms. It is also more accurate than MC for a small number of measurements,
and comparable to MC for a large number of measurements.

Finally, we checked for RCMC and MC our performance only on unobserved entries, to
examine if RRMSE is optimistic due to overfitting to observed entries. Results, shown
in the Appendix, indicate that no overfitting is observed.
NR        d        r    SVLS            OptSpace       SVT
10^{-2}   120156   10   0.0063 (0.15)   0.004 (20.8)   0.0073 (18.7)
10^{-1}   120156   10   0.064 (0.15)    0.044 (21.7)   0.05 (11)
1         120156   10   0.612 (0.16)    0.49 (24.5)    0.51 (1)
10^{-2}   59100    20   0.029 (0.12)    0.97 (25.6)    0.76 (4.4)
10^{-1}   59100    20   0.3 (0.12)      0.98 (40.1)    0.86 (6.5)
10^{-1}   391600   50   0.081 (0.7)     0.05 (1200)    0.069 (13)
1         391600   50   0.72 (0.6)      0.61 (1300)    0.59 (5)

Table 1: RRMSE and time in seconds (in parentheses) for SVLS applied to RCMC,
and OptSpace and SVT applied to the standard MC. Results represent the average of 5
different random matrices. SVLS is faster than OptSpace and SVT by 1 to 3 orders of
magnitude, and shows comparable or better RRMSE in all cases.


5.2 Gaussian Rows and Columns (GRC) 

We tested the performance of the GRC model with N{0,^) and with 

X = UV'^ where U,V N{0, We compare our results to the Gaussian 
Ensemble model (GE) where for each $n$, $\mathcal{A}(X)$ was normalized to allow a fair
comparison. In Figure 3 we take $n = 100$ and $r = 2$, and change the number of
measurements $d = 2nk$ (where $A^{(R)} \in \mathbb{R}^{k \times n}$ and
$A^{(C)} \in \mathbb{R}^{n \times k}$). We added Gaussian noise $Z^{(R)}, Z^{(C)}$
with different noise levels $\tau$. For all noise levels, the performance of GRC
was better than the performance of GE. The RRMSE error decays at a rate of ^/k. 
For GE we used the APGL algorithm for nuclear norm minimization.
Figure 3: RRMSE as a function of $d$, the number of measurements, where we take
$X \in M^{(2)}_{100 \times 100}$, $d$ varied from 400 to 4000 and for different noise
levels: $\tau = 0.1$, $0.01$ and $0.001$. For every point we simulated 5 random
matrices and computed the average RRMSE.

In the next tests we ran SVLS on measurements with different noise levels. We
take $n = 1000$ and $k = 100$ with different rank levels, with every noise entry
drawn from $N(0, \tau^2)$, for different values of $\tau$. Results are shown in
Figure 4. The change in the RRMSE is linear in $\tau$, while the slope depends
on $r$.

We next examined the behaviour of the RRMSE when $n \to \infty$ and when
$n, k, r \to \infty$ together, while the ratios $n/k$ and $k/r$ are kept
constant. Results (shown in the Appendix, Section 7.5) indicate that when
properly scaled, the RRMSE is not sensitive to the value of $n$ and other
parameters, in agreement with Theorem 3.

6 Discussion 

We introduced a new measurement ensemble for low-rank matrix recovery where
every measurement is an affine combination of the entries of a single row or
column of X. We focused on two models: matrix completion from single columns
and rows (RCMC) and matrix recovery from Gaussian combinations of columns and
rows (GRC). We proposed a fast algorithm for this ensemble. For the RCMC model
we proved that in the noiseless case our method recovers X with high
probability, and simulation results show that the RCMC model outperforms the
standard approach for matrix completion in both speed and accuracy for models
with small noise.



Figure 4: RRMSE as a function of the noise level $\tau$, varied from 0 to 0.1,
for matrices $X \in \mathbb{R}^{1000 \times 1000}$ of different ranks. For each
curve we fitted a linear regression line, with fitted slopes 0.145, 0.208, 0.25,
0.3 for $r = 2, 4, 6, 8$, respectively. The slope is roughly proportional to
$\sqrt{r}$, in concordance with the error bound in Theorem 3. Further
investigation of the relation using extensive simulations is required in order
to evaluate the dependence of the recovery error on $r$ more precisely.
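
The proportionality to $\sqrt{r}$ stated in the caption can be checked from the four fitted slopes alone. This is a small arithmetic sketch using only the values quoted in the caption; the variable names are ours.

```python
import math

ranks = [2, 4, 6, 8]
slopes = [0.145, 0.208, 0.25, 0.3]   # fitted slopes quoted in the caption

# If slope = c * sqrt(r), then slope / sqrt(r) should be roughly constant.
ratios = [s / math.sqrt(r) for s, r in zip(slopes, ranks)]
mean_ratio = sum(ratios) / len(ratios)
relative_spread = (max(ratios) - min(ratios)) / mean_ratio
print([round(c, 3) for c in ratios], round(relative_spread, 3))
```

The four ratios differ by only a few percent of their mean, which is what "roughly proportional" means here; with four points this is an indication rather than evidence of the exact exponent.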

For the GRC model we proved that our method recovers X with the optimal number
of measurements in the noiseless case and gave upper bounds on the error in the
noisy case. For RCMC, our simulations show that the RCMC design may achieve
comparable or favorable results compared to the standard MC design, especially
at low noise levels. Proving recovery guarantees for this RCMC model is an
interesting future challenge.

Our proposed measurement scheme is not restricted to recovery of low-rank
matrices. One can employ this measurement scheme and recover X by minimizing
other matrix norms. This direction can lead to new algorithms that may improve
matrix recovery for real datasets.

7 Appendix


7.1 Proofs for Noiseless GRC Case 
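Proof of Lemma 1

Proof. First, $\mathrm{rank}(X_2 A^{(C)}) = \mathrm{rank}(X_1 A^{(C)}) = r$ and
similarly $\mathrm{rank}(A^{(R)} X_2) = \mathrm{rank}(A^{(R)} X_1) = r$. Since
$\mathrm{span}(X_1 A^{(C)})$, $\mathrm{span}(X_2 A^{(C)})$ are subspaces of
$\mathrm{span}(X_1)$, $\mathrm{span}(X_2)$ respectively, and
$\dim(\mathrm{span}(X_i)) \le r$, we get
$\mathrm{span}(X_2) = \mathrm{span}(X_2 A^{(C)}) = \mathrm{span}(X_1 A^{(C)}) = \mathrm{span}(X_1)$,
and we define $U \in O_{n_1 \times r}$ to be a basis for this subspace. For
$X_1, X_2$ there are $Y_1, Y_2 \in \mathbb{R}^{r \times n_2}$ such that
$X_1 = U Y_1$, $X_2 = U Y_2$. Therefore $A^{(R)} U Y_1 = A^{(R)} U Y_2$. Since
$\mathrm{rank}(A^{(R)} U Y_1) = r$ and $U \in O_{n_1 \times r}$ we get
$\mathrm{rank}(A^{(R)} U) = r$, hence the matrix $U^T A^{(R)T} A^{(R)} U$ is
invertible, which gives $Y_1 = Y_2$, and therefore $X_1 = U Y_1 = U Y_2 = X_2$. ■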


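Proof of Lemma 2

Proof. $\mathrm{span}(X A^{(C)}) \subseteq \mathrm{span}(X)$ and
$\mathrm{rank}(X A^{(C)}) = \mathrm{rank}(X) = r$, therefore
$\mathrm{span}(X A^{(C)}) = \mathrm{span}(X)$ and $U$ from stage 1 in Algorithm 1
is a basis for $\mathrm{span}(X)$. We can write $X = UY$ for some matrix
$Y \in \mathbb{R}^{r \times n_2}$. Since
$\mathrm{rank}(A^{(R)} U Y) = \mathrm{rank}(U) = r$, we have
$\mathrm{rank}(A^{(R)} U) = r$. Thus eq. (9) gives $\hat{X}$ in closed form and
we get:

$$A^{(R)} \hat{X} = A^{(R)} U Y = A^{(R)} X, \qquad (16)$$

$$\hat{X} A^{(C)} = U Y A^{(C)} = X A^{(C)}. \qquad (17)$$

■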
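
The two stages used in this proof can be illustrated numerically. The following is a minimal sketch of noiseless recovery under the GRC model, not the full SVLS algorithm: the function name `svls_noiseless`, the matrix sizes and the choice $k = 6 > r$ are our assumptions, and the rank $r$ is taken as known.

```python
import numpy as np

def svls_noiseless(A_R, B_R, B_C, r):
    """Simplified noiseless recovery sketch (rank r assumed known)."""
    # Stage 1: orthonormal basis U for span(B_C) = span(X A_C) = span(X).
    U = np.linalg.svd(B_C, full_matrices=False)[0][:, :r]
    # Stage 2: least-squares solution Y of (A_R U) Y = B_R.
    Y = np.linalg.lstsq(A_R @ U, B_R, rcond=None)[0]
    return U @ Y

rng = np.random.default_rng(0)
n1, n2, r, k = 30, 20, 3, 6
X = rng.standard_normal((n1, r)) @ rng.standard_normal((r, n2))
A_R = rng.standard_normal((k, n1))   # row measurements:    B_R = A_R X
A_C = rng.standard_normal((n2, k))   # column measurements: B_C = X A_C

X_hat = svls_noiseless(A_R, A_R @ X, X @ A_C, r)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(rel_err)
```

Because $A^{(R)} U$ has full column rank with probability 1, the least-squares step has a unique solution and the relative error is at the level of floating-point round-off.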

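Lemma 3. Let $V \in O_{n \times r}$ and let $A^{(C)} \in \mathbb{R}^{n \times k}$
be a random matrix with i.i.d. $N(0, \sigma^2)$ entries. Then $V^T A^{(C)}$ has
i.i.d. $N(0, \sigma^2)$ entries.

Proof. For any two matrices $A \in \mathbb{R}^{n_1 \times n_2}$ and
$B \in \mathbb{R}^{m_1 \times m_2}$ we define their Kronecker product as a
matrix in $\mathbb{R}^{n_1 m_1 \times n_2 m_2}$:

$$A \otimes B = \begin{pmatrix} a_{11} B & a_{12} B & \cdots & a_{1 n_2} B \\ \vdots & \vdots & \ddots & \vdots \\ a_{n_1 1} B & a_{n_1 2} B & \cdots & a_{n_1 n_2} B \end{pmatrix} \qquad (18)$$

Now, we have $\mathrm{vec}(V^T A^{(C)}) = (I_k \otimes V^T)\,\mathrm{vec}(A^{(C)})$
and since $\mathrm{vec}(A^{(C)}) \sim N(0, \sigma^2 I_{nk})$, the vector
$(I_k \otimes V^T)\,\mathrm{vec}(A^{(C)})$ is also a multivariate Gaussian vector
with zero mean and covariance matrix:

$$\mathrm{COV}\big(\mathrm{vec}(V^T A^{(C)})\big) = \mathrm{COV}\big((I_k \otimes V^T)\,\mathrm{vec}(A^{(C)})\big) =$$
$$(I_k \otimes V^T)\,\mathrm{COV}\big(\mathrm{vec}(A^{(C)})\big)\,(I_k \otimes V^T)^T =$$
$$\sigma^2 (I_k \otimes V^T)(I_k \otimes V) = \sigma^2 I_k \otimes I_r = \sigma^2 I_{kr}. \qquad (19)$$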
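
The vectorization identity and the covariance computation in this proof can be verified numerically. The sketch below uses column-stacking for vec(.) and small, arbitrarily chosen dimensions; all names are ours.

```python
import numpy as np

def vec(M):
    """Column-stacking vectorization."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(1)
n, r, k = 8, 3, 5
V = np.linalg.qr(rng.standard_normal((n, r)))[0]   # orthonormal columns
A_C = rng.standard_normal((n, k))

K = np.kron(np.eye(k), V.T)        # I_k (x) V^T, of shape (k*r, k*n)

lhs = vec(V.T @ A_C)
rhs = K @ vec(A_C)
cov_factor = K @ K.T               # covariance of K vec(A_C), up to sigma^2
print(np.allclose(lhs, rhs), np.allclose(cov_factor, np.eye(k * r)))
```

Since `cov_factor` is the identity, the entries of $V^T A^{(C)}$ are uncorrelated Gaussians with the same variance as the entries of $A^{(C)}$, hence independent.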


Proof of Theorem 1

For the GRC model, Lemmas 1, 2 and 3 can be used to prove exact recovery of X
with the minimal possible number of measurements:

Proof. Let $U \Sigma V^T$ be the SVD of $X$. From Lemma 3 the elements of the
matrix $V^T A^{(C)}$ have a continuous Gaussian distribution, and since the
measure of low-rank matrices is zero and $k^{(C)} \ge r$ we get that
$P(\mathrm{rank}(V^T A^{(C)}) = r) = 1$. Since
$B^{(C)} = U \Sigma V^T A^{(C)}$ we get
$P(\mathrm{rank}(B^{(C)}) = \mathrm{rank}(U \Sigma V^T A^{(C)}) = r) = 1$. In
the same way $P(\mathrm{rank}(B^{(R)}) = r) = 1$. Combining Lemma 2 with Lemma 1
gives us the required result. ■

7.2 Gradient Descent 

The gradient descent stage is performed directly in the space of rank-$r$
matrices, using the decomposition $X = WS$ where
$W \in \mathbb{R}^{n_1 \times r}$ and $S \in \mathbb{R}^{r \times n_2}$, and
computing the gradient of the loss as a function of $W$ and $S$,

$$\mathcal{L}(W, S) = \mathcal{F}(WS) = \|A^{(R)} W S - B^{(R)}\|_F^2 + \|W S A^{(C)} - B^{(C)}\|_F^2. \qquad (20)$$

We want to minimize eq. (20), but the loss $\mathcal{L}$ is not convex and
therefore gradient descent may fail to converge to a global optimum. We propose
$\hat{X}$ (the output of SVLS) as a starting point, which may be close enough to
enable gradient descent to converge to the global optimum, and in addition may
accelerate convergence.

The gradient of $\mathcal{L}$ is (using the chain rule)

$$\frac{\partial \mathcal{L}}{\partial W} = 2\left[A^{(R)T}(A^{(R)} W S - B^{(R)}) S^T + (W S A^{(C)} - B^{(C)}) A^{(C)T} S^T\right]$$

$$\frac{\partial \mathcal{L}}{\partial S} = 2\left[W^T A^{(R)T}(A^{(R)} W S - B^{(R)}) + W^T (W S A^{(C)} - B^{(C)}) A^{(C)T}\right] \qquad (21)$$
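
The gradient formulas can be checked against a finite-difference approximation. The sketch below assumes the loss is the sum of the two squared Frobenius residuals (row and column measurements); the function names, sizes and random data are ours and serve only as a sanity check.

```python
import numpy as np

def loss(W, S, A_R, A_C, B_R, B_C):
    X = W @ S
    return np.sum((A_R @ X - B_R) ** 2) + np.sum((X @ A_C - B_C) ** 2)

def grads(W, S, A_R, A_C, B_R, B_C):
    X = W @ S
    # G is the gradient of the loss with respect to X = W S.
    G = 2 * (A_R.T @ (A_R @ X - B_R) + (X @ A_C - B_C) @ A_C.T)
    return G @ S.T, W.T @ G          # chain rule: dL/dW and dL/dS

rng = np.random.default_rng(2)
n1, n2, r, k = 7, 6, 2, 4
W, S = rng.standard_normal((n1, r)), rng.standard_normal((r, n2))
A_R, A_C = rng.standard_normal((k, n1)), rng.standard_normal((n2, k))
B_R, B_C = rng.standard_normal((k, n2)), rng.standard_normal((n1, k))

gW, gS = grads(W, S, A_R, A_C, B_R, B_C)

# Central finite difference along a random direction (dW, dS).
dW, dS = rng.standard_normal(W.shape), rng.standard_normal(S.shape)
eps = 1e-6
numeric = (loss(W + eps * dW, S + eps * dS, A_R, A_C, B_R, B_C)
           - loss(W - eps * dW, S - eps * dS, A_R, A_C, B_R, B_C)) / (2 * eps)
analytic = np.sum(gW * dW) + np.sum(gS * dS)
print(numeric, analytic)
```

In a descent loop one would initialize $W S$ from a rank-$r$ factorization of the SVLS output and step along `-gW` and `-gS`; the step-size rule is not specified here and would have to be chosen separately.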


7.3 Proofs for Noiseless RCMC Case 

We prove that if $U \in O_{n_1 \times r}$ is orthonormal then with high
probability we have $p^{-1} \|U^T A^{(R)T} A^{(R)} U - p I_r\|_2 < 1$. Because
$U$ is orthonormal, this is equivalent to

$$p^{-1} \|U U^T A^{(R)T} A^{(R)} U U^T - p U U^T\|_2 < 1 \iff p^{-1} \|P_U P_{A^{(R)T}} P_U - p P_U\|_2 < 1 \qquad (22)$$

where $P_U = U U^T$, $P_{A^{(R)T}} = A^{(R)T} A^{(R)}$ and $p^{(R)} = p$. We
generalize Theorem 4.1 from [10].
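
The equivalence between the $r \times r$ and the $n_1 \times n_1$ formulations can be seen numerically. In this sketch we assume the RCMC row-measurement matrix consists of the standard basis rows $e_i^T$ selected independently with probability $p$; the sizes and names are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n1, r, p = 200, 4, 0.3
U = np.linalg.qr(rng.standard_normal((n1, r)))[0]   # orthonormal columns
keep = rng.random(n1) < p                            # Bernoulli(p) row inclusion
A_R = np.eye(n1)[keep]                               # selected rows e_i^T

small = U.T @ A_R.T @ A_R @ U - p * np.eye(r)        # r x r formulation
P_U = U @ U.T
big = P_U @ A_R.T @ A_R @ P_U - p * P_U              # n1 x n1 formulation

norm_small = np.linalg.norm(small, 2) / p
norm_big = np.linalg.norm(big, 2) / p
print(norm_small, norm_big)
```

The two spectral norms coincide because the large matrix equals $U$ times the small matrix times $U^T$, and multiplying by a matrix with orthonormal columns preserves the spectral norm.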

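Lemma 4. Suppose $A^{(R)}$ is as in the RCMC model with inclusion probability
$p$, and $U \in O_{n_1 \times r}$ with
$\mu(U) = \frac{n_1}{r} \max_i \|P_U(e_i)\|^2 = \mu$. Then there is a numerical
constant $C_R$ such that for all $\beta > 1$, if
$C_R \sqrt{\frac{\beta \log(n_1) r \mu}{p n_1}} < 1$ then:

$$P\left(p^{-1} \|P_U P_{A^{(R)T}} P_U - p P_U\|_2 < C_R \sqrt{\frac{\beta \log(n_1) r \mu}{p n_1}}\right) > 1 - 3 n_1^{-\beta}. \qquad (23)$$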

The proof of Lemma 4 builds upon (yet generalizes) the proof of Theorem 4.1 from
[10]. We next present a few lemmas which are required for the proof of Lemma 4. We
start with a lemma from iItII . 


Lemma 5. If $y_i$ is a family of vectors in $\mathbb{R}^d$ and $r_i$ is a
sequence of i.i.d. Bernoulli random variables with $P(r_i = 1) = p$, then

$$E\left(p^{-1} \left\|\sum_i (r_i - p)\, y_i \otimes y_i\right\|\right) < C \sqrt{\frac{\log(d)}{p}}\, \max_i \|y_i\| \qquad (24)$$

for some numerical constant $C$, provided that the right hand side is less than 1.


We next use a result from large deviations theory [25]:

Theorem 4. Let $Y_1, \ldots, Y_n$ be a sequence of independent random variables
taking values in a Banach space and define

$$Z = \sup_{f \in F} \sum_{i=1}^{n} f(Y_i) \qquad (25)$$

where $F$ is a real countable set of functions such that if $f \in F$ then
$-f \in F$. Assume that $|f| \le B$ and $E(f(Y_i)) = 0$ for every $f \in F$ and
$i \in [n]$. Then there exists a constant $K$ such that for every $t \ge 0$

$$P(|Z - E(Z)| \ge t) \le 3 \exp\left(-\frac{t}{KB} \log\left(1 + \frac{Bt}{\sigma^2 + B E(Z)}\right)\right) \qquad (26)$$

where $\sigma^2 = \sup_{f \in F} \sum_{i=1}^{n} E(f^2(Y_i))$.

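Theorem 4 is used in the proof of the next lemma, which is taken from Theorem 4.2
in [10]. We present the lemma and its proof in our notation for convenience.

Lemma 6. Let $U \in O_{n \times r}$ with incoherence constant $\mu$. Let $r_i$
be i.i.d. Bernoulli random variables with $P(r_i = 1) = p$ and let
$Y_i = p^{-1}(r_i - p) P_U(e_i) \otimes P_U(e_i)$ for $i = 1, \ldots, n$. Let
$Y = \sum_{i=1}^{n} Y_i$ and $Z = \|Y\|_2$. Suppose $E(Z) \le 1$. Then for every
$\lambda > 0$ we have

$$P\left(|Z - E(Z)| \ge \lambda \sqrt{\frac{\mu r \log(n)}{p n}}\right) \le 3 \exp\left(-\gamma \min\left(\lambda^2 \log(n),\ \lambda \sqrt{\frac{p n \log(n)}{\mu r}}\right)\right) \qquad (27)$$

for some positive constant $\gamma$.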

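Proof. We know that
$Z = \|Y\|_2 = \sup_{f_1, f_2} \langle f_1, Y f_2 \rangle = \sup_{f_1, f_2} \sum_{i=1}^{n} \langle f_1, Y_i f_2 \rangle$,
where the supremum is taken over a countable set of unit vectors
$f_1, f_2 \in F_V$. Let $F$ be the set of all functions $f$ such that
$f(Y) = \langle f_1, Y f_2 \rangle$ for some unit vectors $f_1, f_2 \in F_V$.
For every $f \in F$ and $i \in [n]$ we have $E(f(Y_i)) = 0$. From the
incoherence of $U$ we conclude that

$$|f(Y_i)| = p^{-1} |r_i - p| \times |\langle f_1, P_U(e_i) \rangle| \times |\langle P_U(e_i), f_2 \rangle| \le p^{-1} \|P_U(e_i)\|^2 \le p^{-1} \frac{\mu r}{n}. \qquad (28)$$

In addition

$$E(f^2(Y_i)) = p^{-1}(1 - p) \langle f_1, P_U(e_i) \rangle^2 \langle P_U(e_i), f_2 \rangle^2 \le p^{-1} \|P_U(e_i)\|^2\, |\langle P_U(e_i), f_2 \rangle|^2 \le p^{-1} \frac{\mu r}{n}\, |\langle P_U(e_i), f_2 \rangle|^2. \qquad (29)$$

Since $\sum_{i=1}^{n} |\langle P_U(e_i), f_2 \rangle|^2 = \sum_{i=1}^{n} |\langle e_i, P_U(f_2) \rangle|^2 = \|P_U(f_2)\|^2 \le 1$,
we get $\sum_{i=1}^{n} E(f^2(Y_i)) \le p^{-1} \frac{\mu r}{n}$.

We can take $B = 2 p^{-1} \frac{\mu r}{n}$ and
$t = \lambda \sqrt{\frac{\mu r \log(n)}{p n}}$, and from Theorem 4

$$P(|Z - E(Z)| \ge t) \le 3 \exp\left(-\frac{t}{KB} \log\left(1 + \frac{t}{2}\right)\right) \le 3 \exp\left(-\frac{t \log(2)}{KB} \min\left(1, \frac{t}{2}\right)\right) \qquad (30)$$

where the last inequality is due to the fact that for every $u \ge 0$ we have
$\log(1 + u) \ge \log(2) \min(1, u)$. Taking $\gamma = \log(2)/K$ finishes our
proof. ■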

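We are now ready to prove Lemma 4.

Proof. (Lemma 4) Represent any vector $w \in \mathbb{R}^{n_1}$ in the standard
basis as $w = \sum_{i=1}^{n_1} \langle w, e_i \rangle e_i$. Therefore
$P_U(w) = \sum_{i=1}^{n_1} \langle P_U(w), e_i \rangle e_i = \sum_{i=1}^{n_1} \langle w, P_U(e_i) \rangle e_i$.
Recall the $r_i$ Bernoulli variables which determine if $e_i$ is included as a
row of $A^{(R)}$ as in Section 2, and define $Y_i$ and $Z$ as in Lemma 6. We get

$$P_{A^{(R)T}} P_U(w) = \sum_{i=1}^{n_1} r_i \langle w, P_U(e_i) \rangle e_i$$

$$P_U P_{A^{(R)T}} P_U(w) = \sum_{i=1}^{n_1} r_i \langle w, P_U(e_i) \rangle P_U(e_i) \qquad (31)$$

In other words the matrix $P_U P_{A^{(R)T}} P_U$ is given by

$$P_U P_{A^{(R)T}} P_U = \sum_{i=1}^{n_1} r_i P_U(e_i) \otimes P_U(e_i) \qquad (32)$$

$U$ is $\mu$-incoherent, thus
$\max_{i \in [n_1]} \|P_U(e_i)\| \le \sqrt{\frac{\mu r}{n_1}}$, hence from
Lemma 5 we have for $p$ large enough:

$$E\left(p^{-1} \|P_U P_{A^{(R)T}} P_U - p P_U\|_2\right) \le C \sqrt{\frac{\log(n_1) r \mu}{p n_1}} \le 1. \qquad (33)$$

For $\beta > 1$ which satisfies the lemma's requirement, take
$\lambda = \sqrt{\beta / \gamma}$ where $\gamma$ is as in Lemma 6. We get that if
$p > \frac{\mu \log(n_1) r \beta}{n_1 \gamma}$ then from Lemma 6, with
probability at least $1 - 3 n_1^{-\beta}$, we have
$Z \le C \sqrt{\frac{\log(n_1) r \mu}{p n_1}} + \frac{1}{\sqrt{\gamma}} \sqrt{\frac{\mu r \beta \log(n_1)}{p n_1}}$.
Taking $C_R = C + \frac{1}{\sqrt{\gamma}}$ finishes our proof.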

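Proof of Theorem 2

Proof. From Lemma 4 and using a union bound we have that with probability
$> 1 - 6 \min(n_1, n_2)^{-\beta}$,
$(p^{(R)})^{-1} \|p^{(R)} I_r - U^T A^{(R)T} A^{(R)} U\|_2 < 1$ and
$(p^{(C)})^{-1} \|p^{(C)} I_r - V^T A^{(C)} A^{(C)T} V\|_2 < 1$. Since the
singular values of $p^{(R)} I_r - U^T A^{(R)T} A^{(R)} U$ are
$|p^{(R)} - \sigma_i(U^T A^{(R)T} A^{(R)} U)|$ for $1 \le i \le r$, we have

$$p^{(R)} - \sigma_r(U^T A^{(R)T} A^{(R)} U) \le \sigma_1(p^{(R)} I_r - U^T A^{(R)T} A^{(R)} U) < p^{(R)} \ \Rightarrow\ 0 < \sigma_r(U^T A^{(R)T} A^{(R)} U) \qquad (34)$$

and similarly for $V^T A^{(C)} A^{(C)T} V$. Therefore
$\mathrm{rank}(A^{(R)} U) = \mathrm{rank}(V^T A^{(C)}) = r$ and
$\mathrm{rank}(A^{(R)} X) = \mathrm{rank}(X A^{(C)}) = r$ with probability
$> 1 - 6 \min(n_1, n_2)^{-\beta}$. From Lemma 2 we get
$A^{(R)} \hat{X} = A^{(R)} X$, $\hat{X} A^{(C)} = X A^{(C)}$, and from Lemma 1 we
get $\hat{X} = X$ with probability $> 1 - 6 \min(n_1, n_2)^{-\beta}$. ■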


7.4 Proofs for Noisy GRC Case 


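The proof of Theorem 3 uses strong concentration results on the largest and
smallest singular values of an $n \times k$ matrix with i.i.d. Gaussian entries:

Theorem 5. [24] Let $A \in \mathbb{R}^{n \times k}$ be a random matrix with
$A \overset{i.i.d.}{\sim} N(0, \frac{1}{n})$. Then, its largest and smallest
singular values obey:

$$P\left(\sigma_1(A) > 1 + \frac{\sqrt{k}}{\sqrt{n}} + t\right) \le e^{-n t^2 / 2}, \qquad P\left(\sigma_k(A) < 1 - \frac{\sqrt{k}}{\sqrt{n}} - t\right) \le e^{-n t^2 / 2}. \qquad (35)$$

Corollary 2. Let $A \in \mathbb{R}^{n \times k}$ be a random matrix with
$A \overset{i.i.d.}{\sim} N(0, 1)$ where $n \ge 4k$, and let $A^\dagger$ be the
Moore-Penrose pseudoinverse of $A$. Then

$$P\left(\|A^\dagger\|_2 \le \frac{6}{\sqrt{n}}\right) > 1 - e^{-n/18}. \qquad (36)$$

Proof. Since $A^\dagger$ is the pseudoinverse of $A$,
$\|A^\dagger\|_2 = \frac{1}{\sigma_k(A)}$, and from Theorem 5 we get
$\sigma_k(A) \ge \sqrt{n} - \sqrt{k} - t\sqrt{n}$ with probability
$\ge 1 - e^{-n t^2 / 2}$ (notice the scaling by $\sqrt{n}$ of the entries of $A$
compared to Theorem 5). Therefore, if we take $n \ge 4k$ and $t = \frac{1}{3}$ we
get

$$P\left(\|A^\dagger\|_2 \le \frac{6}{\sqrt{n}}\right) = P\left(\sigma_k(A) \ge \frac{\sqrt{n}}{6}\right) > 1 - e^{-n/18}. \qquad (37)$$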
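
A small Monte Carlo experiment illustrates the corollary. We assume here that the bound reads $\|A^\dagger\|_2 \le 6/\sqrt{n}$ for an $n \times k$ standard Gaussian matrix with $n \ge 4k$; the sizes and the number of trials are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, trials = 400, 100, 50          # n >= 4k, as required by the corollary
bound = 6 / np.sqrt(n)

pinv_norms = []
for _ in range(trials):
    A = rng.standard_normal((n, k))
    sigma_min = np.linalg.svd(A, compute_uv=False)[-1]
    pinv_norms.append(1.0 / sigma_min)   # ||pinv(A)||_2 = 1 / sigma_k(A)

fraction_within_bound = np.mean(np.array(pinv_norms) <= bound)
print(max(pinv_norms), bound, fraction_within_bound)
```

In practice the smallest singular value concentrates near $\sqrt{n} - \sqrt{k}$, so the bound is loose by a constant factor; it is the exponentially small failure probability that matters for the union bound later in the proof.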



We also use the following lemma from [23]:

Lemma 7. Let $Q$ be a finite set of vectors in $\mathbb{R}^n$, let
$\delta \in (0, 1)$ and let $k$ be an integer such that

$$\epsilon = \sqrt{\frac{6 \log(2|Q|/\delta)}{k}} \le 3. \qquad (38)$$

Let $A \in \mathbb{R}^{k \times n}$ be a random matrix with
$A \overset{i.i.d.}{\sim} N(0, \frac{1}{k})$. Then,

$$P\left(\max_{x \in Q} \left|\frac{\|Ax\|_2^2}{\|x\|_2^2} - 1\right| \le \epsilon\right) > 1 - \delta. \qquad (39)$$

Lemma 7 is a direct result of the Johnson-Lindenstrauss lemma 111 ih applied to
each vector in $Q$ and using the union bound. Representing the vectors in $Q$ as
a matrix, Lemma 7 shows that $A^{(R)}$, $A^{(C)}$ preserve the matrix Frobenius
norm with high probability, a weaker property than the RIP, which holds for any
low-rank matrix.
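
The norm-preservation property can be illustrated numerically. The sketch assumes the lemma is stated for squared Euclidean norms with $\epsilon = \sqrt{6 \log(2|Q|/\delta)/k}$ and entries of variance $1/k$; the set $Q$, the dimensions and $\delta$ are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, num_vectors, delta = 500, 400, 20, 0.01
eps = np.sqrt(6 * np.log(2 * num_vectors / delta) / k)

Q = rng.standard_normal((n, num_vectors))        # columns are the vectors in Q
A = rng.standard_normal((k, n)) / np.sqrt(k)     # i.i.d. N(0, 1/k) entries

# Ratio of squared norms ||A x||^2 / ||x||^2 for every x in Q.
ratios = np.sum((A @ Q) ** 2, axis=0) / np.sum(Q ** 2, axis=0)
max_distortion = np.max(np.abs(ratios - 1))
print(eps, max_distortion)
```

Summing the squared column norms shows why this implies preservation of the Frobenius norm of the matrix whose columns are the vectors in $Q$.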

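To prove Theorem 3 we first represent $\|X - \hat{X}\|_F$ as a sum of three parts
(Lemma 8), then give probabilistic upper bounds on each of the parts, and finally
use the union bound. We define $A_{\hat{U}} = A^{(R)} \hat{U}$ and
$A_V = V^T A^{(C)}$. From Lemma 3,
$A_{\hat{U}}, A_V \overset{i.i.d.}{\sim} N(0, 1)$, hence
$\mathrm{rank}(A_{\hat{U}}) = \mathrm{rank}(A_V) = r$ with probability 1. We
assume w.l.o.g. that $\hat{X} = \hat{X}^{(R)}$ (see SVLS description). Therefore,
from eq. (9) we have
$\hat{X} = \hat{U} (A_{\hat{U}}^T A_{\hat{U}})^{-1} A_{\hat{U}}^T B^{(R)}$.
We denote by $A_{\hat{U}}^\dagger = (A_{\hat{U}}^T A_{\hat{U}})^{-1} A_{\hat{U}}^T$
and $A_V^\dagger = A_V^T (A_V A_V^T)^{-1}$ the Moore-Penrose pseudoinverses of
$A_{\hat{U}}$ and $A_V$, respectively. We next prove the following lemma.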


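Lemma 8. Let $A^{(R)}$ and $A^{(C)}$ be as in the GRC model and let $Z^{(R)}$,
$Z^{(C)}$ be noise matrices. Let $\hat{X}$ be the output of SVLS. Then:

$$\|X - \hat{X}\|_F \le \mathrm{I} + \mathrm{II} + \mathrm{III}$$

where:

$$\mathrm{I} = \|(B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_F \qquad (40)$$

$$\mathrm{II} = \|\hat{U} A_{\hat{U}}^\dagger A^{(R)} (B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_F \qquad (41)$$

$$\mathrm{III} = \|\hat{U} A_{\hat{U}}^\dagger Z^{(R)}\|_F \qquad (42)$$

Proof. We represent $\|X - \hat{X}\|_F$ as follows

$$\|X - \hat{X}\|_F = \|X - \hat{U} A_{\hat{U}}^\dagger (A^{(R)} X + Z^{(R)})\|_F =$$
$$\|X - \hat{U} A_{\hat{U}}^\dagger A^{(R)} X - \hat{U} A_{\hat{U}}^\dagger Z^{(R)}\|_F \le$$
$$\|X - \hat{U} A_{\hat{U}}^\dagger A^{(R)} X\|_F + \mathrm{III} \qquad (43)$$

where we have used the triangle inequality. We next use the following equality

$$X A^{(C)} A_V^\dagger V^T = U \Sigma A_V A_V^\dagger V^T = X \qquad (44)$$

to obtain:

$$\|X - \hat{U} A_{\hat{U}}^\dagger A^{(R)} X\|_F =$$
$$\|(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) X\|_F =$$
$$\|(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) X A^{(C)} A_V^\dagger V^T\|_F =$$
$$\|(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) B^{(C,0)} A_V^\dagger\|_F \qquad (45)$$

where the last equality is true because $V$ is orthogonal.

Since $\hat{U}$ is a basis for $\mathrm{span}(B^{(C)}_{(r)})$ there exists a
matrix $Y$ such that $\hat{U} Y = B^{(C)}_{(r)}$ and we get:

$$(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) B^{(C)}_{(r)} = \hat{U} Y - \hat{U} A_{\hat{U}}^\dagger A^{(R)} \hat{U} Y = \hat{U} Y - \hat{U} Y = 0. \qquad (46)$$

Therefore

$$\|(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) B^{(C,0)} A_V^\dagger\|_F =$$
$$\|(I_n - \hat{U} A_{\hat{U}}^\dagger A^{(R)}) (B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_F \le$$
$$\|(B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_F + \|\hat{U} A_{\hat{U}}^\dagger A^{(R)} (B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_F = \mathrm{I} + \mathrm{II} \qquad (47)$$

Combining eq. (43) and eq. (47) gives the required result. ■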

We next bound each of the three parts in the formula of Lemma 8. We use the
following claim:

Claim 1. $\|B^{(C,0)} - B^{(C)}_{(r)}\|_2 \le 2\|Z^{(C)}\|_2$

Proof. We know that
$\|B^{(C)} - B^{(C)}_{(r)}\|_2 \le \|B^{(C)} - B^{(C,0)}\|_2$ since
$\mathrm{rank}(B^{(C)}_{(r)}) = \mathrm{rank}(B^{(C,0)}) = r$ with probability 1,
and by definition $B^{(C)}_{(r)}$ is the closest rank-$r$ matrix to $B^{(C)}$ in
Frobenius norm (and, by the Eckart-Young theorem, also in spectral norm).
Therefore from the triangle inequality

$$\|B^{(C,0)} - B^{(C)}_{(r)}\|_2 \le \|B^{(C)} - B^{(C)}_{(r)}\|_2 + \|B^{(C)} - B^{(C,0)}\|_2 \le 2\|B^{(C,0)} - B^{(C)}\|_2 = 2\|Z^{(C)}\|_2. \qquad (48)$$

Now we are ready to prove Theorem 3. The proof uses the following inequalities
for matrix norms, for any two matrices $A$, $B$:

$$\|AB\|_2 \le \|A\|_2 \|B\|_2$$

$$\|AB\|_F \le \|A\|_F \|B\|_2$$

$$\mathrm{rank}(A) \le r \ \Rightarrow\ \|A\|_F \le \sqrt{r}\, \|A\|_2. \qquad (49)$$
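
The three norm inequalities are standard and can be checked on random matrices. The sketch below uses arbitrary sizes and a rank-$r$ matrix $A$ built as a product of two thin factors; the helper names `spec` and `fro` are ours.

```python
import numpy as np

def spec(M):
    return np.linalg.norm(M, 2)        # spectral norm (largest singular value)

def fro(M):
    return np.linalg.norm(M, "fro")    # Frobenius norm

rng = np.random.default_rng(6)
r = 3
A = rng.standard_normal((20, r)) @ rng.standard_normal((r, 15))   # rank(A) <= r
B = rng.standard_normal((15, 10))
tol = 1e-9                              # slack for floating-point round-off

submultiplicative = spec(A @ B) <= spec(A) * spec(B) + tol
mixed = fro(A @ B) <= fro(A) * spec(B) + tol
rank_bound = fro(A) <= np.sqrt(r) * spec(A) + tol
print(submultiplicative, mixed, rank_bound)
```

The last inequality follows because the squared Frobenius norm is the sum of at most $r$ squared singular values, each bounded by the largest one.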

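Proof. (Theorem 3) We prove (probabilistic) upper bounds on the three terms
appearing in Lemma 8.

1. We have

$$\mathrm{rank}\big((B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\big) \le \mathrm{rank}(A_V^\dagger) \le r. \qquad (50)$$

Therefore

$$\mathrm{I} \le \sqrt{r}\, \|(B^{(C,0)} - B^{(C)}_{(r)}) A_V^\dagger\|_2 \le \sqrt{r}\, \|A_V^\dagger\|_2\, \|B^{(C,0)} - B^{(C)}_{(r)}\|_2. \qquad (51)$$

Since $A_V \overset{i.i.d.}{\sim} N(0, 1)$, from Corollary 2 we get
$P(\|A_V^\dagger\|_2 \le 6/\sqrt{k}) > 1 - e^{-k/18}$ for $k \ge 4r$, hence with
probability $> 1 - e^{-k/18}$

$$\mathrm{I} \le 6 \sqrt{\frac{r}{k}}\, \|B^{(C,0)} - B^{(C)}_{(r)}\|_2. \qquad (52)$$

From Claim 1 and eq. (40) we get a bound on I for some absolute constants
$C_1$, $c_1$:

$$P\left(\mathrm{I} \le C_1 \sqrt{\frac{r}{k}}\, \|Z^{(C)}\|_2\right) > 1 - e^{-c_1 k}. \qquad (53)$$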

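2. $\hat{U}$ is orthogonal and can be omitted from II without changing the norm.
Applying the second inequality in eq. (49) twice, we get the inequality:

$$\mathrm{II} \le \|A_{\hat{U}}^\dagger\|_2\, \|A^{(R)} (B^{(C,0)} - B^{(C)}_{(r)})\|_F\, \|A_V^\dagger\|_2. \qquad (54)$$

From Corollary 2 we know that for $k \ge 4r$ we have
$\|A_{\hat{U}}^\dagger\|_2 \le 6/\sqrt{k}$ and $\|A_V^\dagger\|_2 \le 6/\sqrt{k}$,
each with probability $> 1 - e^{-k/18}$. Therefore,

$$P\left(\mathrm{II} \le \frac{36}{k}\, \|A^{(R)} (B^{(C,0)} - B^{(C)}_{(r)})\|_F\right) > 1 - 2e^{-k/18}. \qquad (55)$$

$A^{(R)}$ and $B^{(C,0)} - B^{(C)}_{(r)}$ are independent and
$\mathrm{rank}(B^{(C,0)} - B^{(C)}_{(r)}) \le 2r$. Therefore we can apply Lemma 7
with $k$ such that $\frac{k}{6} \ge \log(2k) + \frac{k}{18}$ (this holds for
$k \ge 40$) to get with probability $> 1 - e^{-k/18}$

$$\mathrm{II} \le \frac{36}{k}\, \|A^{(R)} (B^{(C,0)} - B^{(C)}_{(r)})\|_F \le \frac{36\sqrt{2}}{\sqrt{k}}\, \|B^{(C,0)} - B^{(C)}_{(r)}\|_F \le 36 \sqrt{\frac{4r}{k}}\, \|B^{(C,0)} - B^{(C)}_{(r)}\|_2. \qquad (56)$$

From eq. (55) and (56) together with Claim 1 we have constants $C_2$ and $c_2$
such that

$$P\left(\mathrm{II} \le C_2 \sqrt{\frac{r}{k}}\, \|Z^{(C)}\|_2\right) > 1 - 3e^{-c_2 k}. \qquad (57)$$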

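3. $\mathrm{rank}(A_{\hat{U}}^\dagger) \le r$ and from Corollary 2 we get
$P(\|A_{\hat{U}}^\dagger\|_2 \le 6/\sqrt{k}) > 1 - e^{-k/18}$ for $k \ge 4r$.
Therefore, with probability $> 1 - e^{-k/18}$

$$\mathrm{III} = \|\hat{U} A_{\hat{U}}^\dagger Z^{(R)}\|_F = \|A_{\hat{U}}^\dagger Z^{(R)}\|_F \le \sqrt{r}\, \|A_{\hat{U}}^\dagger\|_2\, \|Z^{(R)}\|_2 \le 6 \sqrt{\frac{r}{k}}\, \|Z^{(R)}\|_2. \qquad (58)$$

Hence we have constants $C_3$ and $c_3$ such that

$$P\left(\mathrm{III} \le C_3 \sqrt{\frac{r}{k}}\, \|Z^{(R)}\|_2\right) > 1 - e^{-c_3 k}. \qquad (59)$$

Combining equations (53), (57) and (59) with Lemma 8 and taking the union bound,
while setting the constants multiplying the two noise terms to $C_1 + C_2$ and
$C_3$, with $c = \min(c_1, c_2, c_3)$, concludes our proof. ■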

7.5 Simulations for Large Values of n

We varied $n$ between 10 and 1000, with results averaged over 100 different
matrices of rank 3 at each point, and tried to recover them using $k = 20$ row
and column measurements. Measurement matrices were $N(0, \frac{1}{n})$ to allow
similar norms for each measurement vector for different values of $n$. Recovery
performance was insensitive to $n$. If we take $N(0, 1)$ instead of
$N(0, \frac{1}{n})$, the scaling of our results is in agreement with Theorem 3.



Figure 5: Reconstruction error for an $n \times n$ matrix where $n$ is varied
between 10 and 1000, $k = 20$ and $r = 3$, for two different noise levels:
$\tau = 0.1$ (blue) and $\tau = 0.01$ (red). Each point represents average
performance over 100 random matrices.

Next, we take $n, k, r \to \infty$ while the ratios $n/k = 5$ and $k/r = 4$ are
kept constant, and compute the relative error for different noise levels. Again,
the relative error converges rapidly to a constant, independent of $n, k, r$.

Figure 6: Reconstruction error for an $n \times n$ matrix $X$ with rank $r$
varying from 1 to 50 and with $n = 20r$, $k = 4r$. Two different noise levels
are shown: $\tau = 0.1$ (blue) and $\tau = 0.01$ (red). Each point represents
average performance over 100 random matrices.


7.6 Low Rank matrix Approximation 

We present here the one-pass algorithm to approximate X from ll^ for the
convenience of the reader. The output of this algorithm is not low rank if
$k > r$. This algorithm is different from SVLSp and its purpose is to approximate
a (possibly full rank) matrix by a low-rank matrix. We adjusted Algorithm 3 to
our purpose with some changes. First, we estimate the rank of X using the elbow
method from Section 3.3, and instead of calculating the QR decompositions of
$B^{(C)}$ and $B^{(R)T}$ we find their $\hat{r}$ largest singular vectors.
Furthermore, we repeat part two in Algorithm 3 while exchanging the roles of
columns and rows as in SVLS. This variation gives our modified algorithm SVLSp
as described in Section iTAl

We compared our SVLS to SVLSp, which is presented in Section [TAI We took
$X \in \mathbb{R}^{1000 \times 1000}$ and $\sigma = 1$. We tried to recover X in
the GRC model with $k = 12$ for 100 different matrices. For each matrix, we
compared the RRMSE obtained for the outputs of SVLS and SVLSp. The RRMSE for
SVLSp was lower than the RRMSE for SVLS in most cases, but the differences were
very small and negligible.

Algorithm 3

Input: $A^{(C)}$, $B^{(R)}$, $B^{(C)}$

1. Compute $Q^{(C)} R^{(C)}$, the QR decomposition of $B^{(C)}$, and
$Q^{(R)} R^{(R)}$, the QR decomposition of $B^{(R)T}$.

2. Find the least-squares solution
$Y = \arg\min_C \|Q^{(C)T} B^{(C)} - C\, Q^{(R)T} A^{(C)}\|_F$.

3. Return the estimate $\hat{X} = Q^{(C)} Y Q^{(R)T}$.
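
The three steps can be sketched as follows. This is an illustration under our own reading of step 2, namely that $Y$ solves $Q^{(C)T} B^{(C)} = Y Q^{(R)T} A^{(C)}$ in the least-squares sense; the function name `one_pass_estimate` is hypothetical, and the example uses the noiseless case with $k = r$, where the estimate is exact.

```python
import numpy as np

def one_pass_estimate(A_C, B_R, B_C):
    Q_C = np.linalg.qr(B_C)[0]       # orthonormal basis from the column measurements
    Q_R = np.linalg.qr(B_R.T)[0]     # orthonormal basis from the row measurements
    M = Q_R.T @ A_C
    T = Q_C.T @ B_C
    # Solve Y M = T in the least-squares sense via the transposed system M^T Y^T = T^T.
    Y = np.linalg.lstsq(M.T, T.T, rcond=None)[0].T
    return Q_C @ Y @ Q_R.T

rng = np.random.default_rng(8)
n, r = 40, 4
k = r                                 # noiseless case with k = r
X = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A_R = rng.standard_normal((k, n))
A_C = rng.standard_normal((n, k))

X_hat = one_pass_estimate(A_C, A_R @ X, X @ A_C)
rel_err = np.linalg.norm(X_hat - X) / np.linalg.norm(X)
print(rel_err)
```

With noise or with $k > r$ the returned matrix generally has rank $k$ rather than $r$, which is the reason the text replaces the QR factors by the leading singular vectors.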



Figure 7: We recover a matrix X from 24000 measurements as in the GRC model 100
times. The figure shows the RRMSE over the 100 simulations for SVLS (Y axis) and
SVLSp (X axis). The red line Y = X is drawn for comparing the two algorithms:
every dot under the red line is a simulation in which SVLS was better than
SVLSp, and every dot above the line indicates the opposite.

7.7 Rank Estimation 

We test the elbow method for estimating the rank of X (see eq. (10)). We take a
matrix X of size $400 \times 400$ and different ranks. We add Gaussian noise with
$\sigma = 0.25$ while the measurements are sampled as in the RCMC model. For each
number of measurements we sampled 100 matrices and took the average estimated
rank. We compute the estimator $\hat{r}$ for different values of $d$, the number
of measurements. We compare our method to the rank estimation which appears in
OptSpace [17] for the standard MC problem. Our simulation results, shown in
Figure 8, indicate that the RCMC model with the elbow method is a much better
design for rank estimation of X.



Figure 8: Estimation of $\mathrm{rank}(X)$ vs. $d$, the number of measurements,
$d = k(2n - k)$ where $k$ is the number of columns in $A^{(C)}$ and the number of
rows in $A^{(R)}$. For each $d$ we sampled 100 different matrices. Estimation was
performed by the elbow method for the RCMC model, as in eq. (10) in the main
text, and for the MC model we used the method described in [17]. RCMC recovers
the correct rank with a smaller number of measurements.


7.8 Test Error 

In matrix completion with MC and RCMC ensembles the RRMSE loss function measures
the loss on both the observed and unobserved entries. This loss may be too
optimistic when considering our prediction error only on unobserved entries.
Thus, instead of including all entries in the calculation of the RRMSE we compute
a different measure of prediction error, given by the RRMSE only on the
unobserved entries. For each single-entry measurement operator $\mathcal{A}$
define $E(\mathcal{A})$ to be the set of measured entries and $\bar{E}$ its
complement, i.e. the set of unmeasured entries $(i, j) \in [n_1] \times [n_2]$.
We define $X^{\bar{E}}$ to be a matrix such that $X^{\bar{E}}_{ij} = X_{ij}$ if
$(i, j) \in \bar{E}$ and 0 otherwise. Instead of $RRMSE(X, \hat{X})$ we now
calculate $RRMSE(X^{\bar{E}}, \hat{X}^{\bar{E}})$. This quantity measures our
reconstruction only on the unseen matrix entries $X_{ij}$, and is thus not
influenced by overfitting. In Table 2 we performed exactly the same simulation
as in Table 1 but with $RRMSE(X^{\bar{E}}, \hat{X}^{\bar{E}})$. The results of
OptSpace, SVT and SVLS stay similar to the results in Table 1, and our RRMSE loss
function does not show overfitting.
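
The restricted error measure can be computed with a boolean mask. The sketch below assumes that RRMSE is the relative error in Frobenius norm (its definition is not repeated in this section) and uses an RCMC-style pattern of $k$ fully observed rows and $k$ fully observed columns; the estimate `X_hat` is a synthetic perturbation of `X`, not the output of any recovery algorithm.

```python
import numpy as np

def rrmse(X, X_hat):
    # Assumed definition: relative error in Frobenius norm.
    return np.linalg.norm(X - X_hat) / np.linalg.norm(X)

def rrmse_unobserved(X, X_hat, observed):
    unobserved = ~observed
    # Zero out the observed entries of both matrices before comparing.
    return rrmse(X * unobserved, X_hat * unobserved)

rng = np.random.default_rng(7)
n, k = 50, 10
X = rng.standard_normal((n, 3)) @ rng.standard_normal((3, n))
X_hat = X + 0.01 * rng.standard_normal((n, n))       # synthetic estimate

# RCMC-style pattern: k full rows and k full columns are observed.
observed = np.zeros((n, n), dtype=bool)
observed[rng.choice(n, k, replace=False), :] = True
observed[:, rng.choice(n, k, replace=False)] = True

num_observed = int(observed.sum())
test_error = rrmse_unobserved(X, X_hat, observed)
print(num_observed, test_error)
```

The count of observed entries equals $k(2n - k)$, because the $k^2$ entries at the intersections of observed rows and columns are counted once.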


NR          d        r    SVLS            OptSpace        SVT
$10^{-2}$   120156   10   0.006 (0.006)   0.004 (0.004)   0.0074 (0.0073)
$10^{-1}$   120156   10   0.065 (0.064)   0.045 (0.044)   0.051 (0.05)
1           120156   10   0.619 (0.612)   0.49 (0.49)     0.52 (0.51)

Table 2: RRMSE only on the unobserved entries, for SVLS applied to RCMC, and
OptSpace and SVT applied to the standard MC. Results represent the average of 5
different random matrices. The results in parentheses are the standard RRMSE
from Table 1.